Improving Stack Overflow Tag Prediction Using Eye Tracking
Abstract
I) Goals and Purpose

Software developers use Stack Overflow to post questions and answers about programming and computer science problems they need to solve. Common questions ask for efficient, time-saving ways to code a particular program or for help with resolving bottlenecks in code. When users submit a question on Stack Overflow, they must also attach between one and five tags (see Figure 1). These tags broadly identify the programming language being discussed, the type of problem, and possibly finer-grained categories to which the question belongs, and they support information retrieval and user queries. The goal of this project was to develop a tag prediction system that uses eye tracking to improve the accuracy of auto-generated Stack Overflow question tags. These tags are important because they let users on the Stack Overflow network describe a problem within a program more precisely and answer programming questions more precisely.

The main research question for our project is:
• To what degree do programmers focus on the keywords that tag extraction techniques generate?

Building on the results of this question, a follow-up study could address two further questions. This project, however, focused only on the question above.
• To what degree do the top n keywords from our approach and the standard approach match our Oracle-generated keywords?
• Which machine learning algorithms can be used successfully to make such predictions?
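The abstract does not say how "degree of focus" or "degree of match" is measured. The sketch below shows one plausible way to quantify both research questions, offered as an illustration rather than the authors' method. It assumes eye-tracker fixations have already been mapped to the word each one landed on, as hypothetical `(token, duration_ms)` pairs. The function names, the share-of-fixation-time measure, and the "overlap at n" metric are all assumptions introduced here.

```python
from collections import defaultdict


def keyword_attention_share(fixations, keywords):
    """Fraction of total fixation time spent on extracted keywords.

    Assumption: `fixations` is a list of (token, duration_ms) pairs,
    already mapped from gaze coordinates to words (hypothetical format).
    """
    keyword_set = {k.lower() for k in keywords}
    total = sum(d for _, d in fixations)
    if total == 0:
        return 0.0  # no gaze data, so no measurable attention
    on_keywords = sum(d for tok, d in fixations if tok.lower() in keyword_set)
    return on_keywords / total


def rank_by_gaze(fixations):
    """Rank distinct tokens by cumulative fixation duration, longest first."""
    totals = defaultdict(float)
    for tok, d in fixations:
        totals[tok.lower()] += d
    # Sort by descending duration; break ties alphabetically for determinism.
    return [t for t, _ in sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))]


def overlap_at_n(ranked, oracle, n):
    """Fraction of the top-n ranked keywords that appear in the oracle set."""
    top = ranked[:n]
    if not top:
        return 0.0
    oracle_set = {o.lower() for o in oracle}
    return sum(1 for t in top if t in oracle_set) / len(top)


if __name__ == "__main__":
    # Hypothetical fixations on a single question; not data from the study.
    fixations = [("python", 420), ("list", 180), ("sort", 350),
                 ("the", 60), ("dictionary", 300), ("how", 90)]
    extracted = ["python", "sort", "dictionary"]   # e.g. from a tag extractor
    oracle = ["python", "sorting", "dictionary"]   # hypothetical Oracle keywords

    print(round(keyword_attention_share(fixations, extracted), 3))  # 0.764
    print(round(overlap_at_n(rank_by_gaze(fixations), oracle, 3), 3))  # 0.667
```

Ranking by cumulative fixation time is only one choice; fixation count or regressions to a word would be equally reasonable gaze signals, and the same `overlap_at_n` comparison could be applied to keywords from a standard (non-gaze) extraction approach.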
Similar resources
Anchored Discrete Factor Analysis
We present a semi-supervised learning algorithm for learning discrete factor analysis models with arbitrary structure on the latent variables. Our algorithm assumes that every latent variable has an “anchor”, an observed variable with only that latent variable as its parent. Given such anchors, we show that it is possible to consistently recover moments of the latent variables and use these mom...
#ML #NLP: Autonomous Tagging of Stack Overflow Questions
Online question and answer forums such as Stack Exchange and Quora are becoming an increasingly popular resource for education. Central to the functionality of many of these forums is the notion of tagging, whereby a user labels his/her post with an appropriate set of topics that describe the post, such that it is more easily retrieved and organized. We propose a multi-label classification syst...
Embedded Emotion-based Classification of Stack Overflow Questions Towards the Question Quality Prediction
Software developers often ask questions in Stack Overflow Q & A site, and their posted questions sometimes do not meet the standard guidelines. As a consequence, some of the questions are edited by expert users, some of them are down-voted, or some are even deleted permanently. Besides, the users (i.e., developers) might not get the expected solutions for their problems. In this paper, we study...
CASE-QA: Context and Syntax embeddings for Question Answering On Stack Overflow
Question answering (QA) systems rely on both knowledge bases and unstructured text corpora. Domain-specific QA presents a unique challenge, since relevant knowledge bases are often lacking and unstructured text is difficult to query and parse. This project focuses on the QUASAR-S dataset (Dhingra et al., 2017) constructed from the community QA site Stack Overflow. QUASAR-S consists of Cloze-sty...
Comparing vector-based and Bayesian memory models using large-scale datasets: User-generated hashtag and tag prediction on Twitter and Stack Overflow.
The growth of social media and user-created content on online sites provides unique opportunities to study models of human declarative memory. By framing the task of choosing a hashtag for a tweet and tagging a post on Stack Overflow as a declarative memory retrieval problem, 2 cognitively plausible declarative memory models were applied to millions of posts and tweets and evaluated on how accu...